A Method to Construct Reconfigurable Simulators from Architecture Templates
Abstract
PRORISC/IEEE WORKSHOP, MIERLO NOV. 27-28 1997, THE NETHERLANDS

Fig. 4. The separation between interface and implementation. (Figure labels: Class, Derived Class, Provides the Interface.)

Therefore, we have defined an abstract class Buffer that provides the methods read and write. From this abstract class we derived new classes that each implement a specific behavior, as shown in figure 4. For the handshake buffer we implemented a single buffer position, for the bounded Fifo a circular buffer, and for the unbounded Fifo a linked list. For the dataflow template given in figure 3, we have already implemented many different elements together with alternative implementations, as given in figure 5.

Fig. 5. The hierarchy of classes for the architecture elements. (Figure labels, abstract classes: Elements, Controller, Router, Buffer, Pipeline, Functional Element, Functional Unit. Derived classes: First-come-first-served, Round Robin, Phase, Handshake, Bounded Fifo, Unbounded Fifo, Synchronous, Asynchronous, Packet-switching, Sample-switching.)

Recently, the standard C++ programming language is supported by a very powerful library called the Standard Template Library (STL) [9]. Using templates, this library efficiently implements high-level programming constructs such as vectors, lists and maps, to name a few. It turned out to be very useful in the design of the reconfigurable simulator.

The VHDL language is designed especially to describe hardware [10], and it is interesting to see that objects can play the same role as entities in VHDL to describe structures. The separation between a structure and its behavior is also carried through in VHDL, which allows each entity to be implemented by possibly different architectures. VHDL is, however, a specification language and not a programming language, and it is therefore not possible to use the same language to specify architectures and to construct simulators for them.
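The Buffer hierarchy described earlier in this subsection can be sketched as follows. This is a minimal illustration, not the ORAS source: the class names HandshakeBuffer, BoundedFifo and UnboundedFifo, the Sample struct, and the non-blocking bool/nullptr return convention are assumptions made for the example. In the paper the buffers block through PAMELA semaphores instead of returning a status.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical token type; the paper's Fig. 6 passes Sample pointers.
struct Sample { int value; };

// Abstract class: provides only the interface (read and write).
class Buffer {
public:
    virtual ~Buffer() = default;
    virtual bool write(Sample* s) = 0;  // false if no room (assumed convention)
    virtual Sample* read() = 0;         // nullptr if empty (assumed convention)
};

// Handshake buffer: a single buffer position.
class HandshakeBuffer : public Buffer {
    Sample* slot_ = nullptr;
public:
    bool write(Sample* s) override {
        if (slot_ != nullptr) return false;
        slot_ = s;
        return true;
    }
    Sample* read() override {
        Sample* s = slot_;
        slot_ = nullptr;
        return s;
    }
};

// Bounded Fifo: a circular buffer of fixed capacity.
class BoundedFifo : public Buffer {
    std::vector<Sample*> buffer_;
    std::size_t readIdx_ = 0, writeIdx_ = 0, count_ = 0;
public:
    explicit BoundedFifo(std::size_t capacity) : buffer_(capacity) {}
    bool write(Sample* s) override {
        if (count_ == buffer_.size()) return false;
        buffer_[writeIdx_] = s;
        writeIdx_ = (writeIdx_ + 1) % buffer_.size();  // wrap around
        ++count_;
        return true;
    }
    Sample* read() override {
        if (count_ == 0) return nullptr;
        Sample* s = buffer_[readIdx_];
        readIdx_ = (readIdx_ + 1) % buffer_.size();    // wrap around
        --count_;
        return s;
    }
};

// Unbounded Fifo: a linked list that grows on demand.
class UnboundedFifo : public Buffer {
    std::list<Sample*> items_;
public:
    bool write(Sample* s) override { items_.push_back(s); return true; }
    Sample* read() override {
        if (items_.empty()) return nullptr;
        Sample* s = items_.front();
        items_.pop_front();
        return s;
    }
};
```

Code that holds only a Buffer pointer or reference can call read and write without knowing which of the three implementations is behind it, which is the separation that figure 4 depicts.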
VHDL also lacks high-level constructs and the ability to use abstract data types.

C. Language

We use a parser to check whether a description of an architecture is valid, because objects are not capable of checking whether a structure is still valid: there is effectively a single flat interface provided by all objects to all objects. The parser uses a grammar to describe all allowed patterns (e.g. structures), and it is this grammar that describes the architecture template. The parser takes as its input a textual architecture description, breaks it up into architectural elements (e.g. the keywords) and matches the structure of the elements against its grammar. If the structure matches, a valid construction has been found and at that moment an architecture element is instantiated. This process continues until the complete architecture description has been parsed and thus a complete and valid architecture instance has been built. For our dataflow architecture, we have used the tools Flex [12] and Bison [11]. These tools allow us to build a parser based on the architecture template specified in BNF (Backus-Naur Form).

V. Execution Model

To simulate an architecture instance, an execution mechanism is needed that provides the rules governing the order of execution of the various architecture elements. To simulate the performance of the dataflow architecture, which is basically a parallel system, it is enough to model two aspects of the architecture: condition synchronization of data and mutual exclusion of resources. These two effects can be described using only processes, semaphores and delays, as shown in the PAMELA work of Arjan van Gemund [13]. These three constructs have been implemented as C functions in the PAMELA run-time library (RTL). The RTL is a multi-threading package that has a notion of a global virtual time and performs the necessary process scheduling based on this global time. Architecture elements are built in C++ using PAMELA constructs.
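The condition synchronization described above can be illustrated with a small single-threaded model. It is a sketch under stated assumptions, not the PAMELA RTL: the Semaphore and FifoSync types and the tryP method are hypothetical, and where a real run-time library would suspend the calling process, this model only reports that the process would block. The virtualTime counter stands in for the global virtual time and is advanced by one cycle per write.

```cpp
#include <cassert>

// Counting semaphore modeled as a plain counter (hypothetical type).
// A real run-time library would suspend the caller inside P();
// tryP() only reports whether the caller could proceed.
class Semaphore {
    int count_;
public:
    explicit Semaphore(int initial) : count_(initial) {}
    bool tryP() {               // acquire: succeeds only if count > 0
        if (count_ == 0) return false;
        --count_;
        return true;
    }
    void V() { ++count_; }      // release
    int count() const { return count_; }
};

// Synchronization state of a buffer with `capacity` positions:
// `room` counts free positions, `data` counts stored tokens.
struct FifoSync {
    Semaphore room;
    Semaphore data;
    long virtualTime = 0;       // stand-in for the global virtual time
    explicit FifoSync(int capacity) : room(capacity), data(0) {}

    bool write() {              // producer side
        if (!room.tryP()) return false;  // no room: producer would block
        virtualTime += 1;                // delay: one clock cycle per write
        data.V();                        // signal that data is available
        return true;
    }
    bool read() {               // consumer side
        if (!data.tryP()) return false;  // no data: consumer would block
        room.V();                        // signal that room is available
        return true;
    }
};
```

With a capacity of two, a third consecutive write reports that it would block until a read releases a position. This is the behavior that the room and data semaphores enforce between a producing and a consuming process.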
The implementation of the write method of a bounded Fifo buffer is given in figure 6. It shows how the functionality of the Fifo buffer is intertwined with PAMELA statements. This coupling shows how the simulation mechanism influences the level at which the architecture elements need to be modeled.

    void Fifo::write(Sample* a)
    {
        pam_P(room);                             // Is there room on the Fifo?
        metricCollector->histogram(token++);     // Measurement
        buffer[writefifo] = a;                   // Write in buffer
        writefifo = (writefifo + 1) % capacity;  // Advance the circular write index
        pam_delay(1);                            // It takes 1 clock-cycle to write
        pam_V(data);                             // Tell there is data available
    }

Fig. 6. A Write Method of a Bounded Fifo Buffer

The PAMELA processes are instantiated after the architecture instance is constructed. Therefore all elements and connections between elements are already fixed, and the PAMELA processes can run without further need to decode aspects of the architecture. This results in an efficient simulation that is still flexible. The resulting ORAS simulator can execute 10,000 instructions per second.

The PAMELA approach can be seen as a very efficient discrete event simulator. This efficiency is obtained by observing only one event per process and not having conditional events. As a result, it is difficult to handle aspects like polling or interrupts. If these aspects are needed to obtain accurate performance numbers, a VHDL simulation mechanism can be implemented within C++ as presented in [14]. This simulation mechanism is more accurate but also much slower.

VI. Metric-Collectors

To obtain performance metrics, the architecture is instrumented with Metric-Collectors. They collect performance numbers during the execution of the simulator, and the results are presented at the end of the execution.

Each architecture element is already instrumented with specific metric collectors, but in the instrumenting phase we can determine whether they should be made active or not.
This way the amount of data produced can be limited to the performance metrics of interest. In table II some metrics are given for the elements of the dataflow architecture. Special metric collectors gather information on the complete architecture, like the number of executed operations or the total execution time in clock cycles. These two numbers are used to derive the performance metric "parallelism".

TABLE II
Implemented Metrics for the Different Elements

Element              Metric
Comm. Structure      Utilization
Controller           Utilization
Buffer               Filling distribution
Routers              Response Time Controller
Functional Unit      Utilization, Number of Context Switches
Functional Element   Utilization, Pipeline Stalls, Throughput, Number of Operations
Architecture         Number of Operations, Total execution time

In the code given in figure 6, a metric collector is used to determine the filling distribution within a bounded Fifo buffer. Each time a token is written into the buffer, the collector determines how many tokens are present in the buffer at that moment. At the end of an execution each Fifo buffer presents a histogram of the buffer filling. Other collectors observe how long a semaphore was blocking a process. The "response time of the controller" is obtained this way, by observing how long a Router waits for the controller to become available.

VII. Design Space Exploration

Given the reconfigurable simulator ORAS, we can perform a design space exploration. This means that we need to generate parameters in a structured way for the architecture template as shown in figure 1. For this purpose we use the versatile scripting language Perl [15].

A Perl script constructs a number of architecture description files for a range of parameters reflecting changes in the architecture, e.g. parameters of elements, the structure of the architecture, or alternative realizations of elements. The resulting files are taken in by ORAS to construct executable models for the architecture as explained in this paper.
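The generation step just described amounts to enumerating the cross product of the parameter ranges and writing one description per combination. The sketch below shows that idea in C++ for consistency with the rest of the code in this text, whereas the authors use a Perl script. The Instance struct, the two parameter names, and the output format of describe are all hypothetical and do not reflect the ORAS description language.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical parameters of one architecture instance.
struct Instance {
    int bufferCapacity;
    int numFunctionalUnits;
};

// Enumerate the full cross product of two parameter ranges.
std::vector<Instance> sweep(const std::vector<int>& capacities,
                            const std::vector<int>& unitCounts) {
    std::vector<Instance> instances;
    for (int c : capacities)
        for (int u : unitCounts)
            instances.push_back({c, u});
    return instances;
}

// Render one instance as a textual description (made-up format).
std::string describe(const Instance& inst) {
    std::ostringstream out;
    out << "Buffer capacity=" << inst.bufferCapacity << ";\n"
        << "FunctionalUnit count=" << inst.numFunctionalUnits << ";\n";
    return out.str();
}
```

Ten values for each of the two parameters give 10 x 10 = 100 descriptions, each of which would be handed to the simulator in turn.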
Next, the Perl script executes these simulators for a set of applications, and the obtained performance numbers are stored in a database. After all runs are completed, graphs are constructed from the stored performance numbers. Results obtained using this method are shown in [1], [2].

In each design space exploration, enormous amounts of data are generated. A simple experiment generates 100 different architecture instances by changing 2 parameters; thus, for each application, 100 different simulation results are produced, containing the performance numbers. To manage these amounts of data and their consistency, we have integrated the Perl script, and thus the reconfigurable simulator ORAS, in the Nelsis design data management system [16].

VIII. Conclusions

Using ORAS, design space explorations have been performed for a programmable IC that will be applied in video applications for the consumer market. ORAS turned out to be very useful for a designer evaluating design alternatives in application-specific dataflow architectures. Different architecture instances are quickly specified in the textual description language, and the design space exploration process can easily be automated using the Perl scripts and the Nelsis design data management system. The obtained speed of ORAS of 10,000 instructions per second made it possible to explore many different architectures in a limited amount of time. If we compare this figure of 10,000 with the numbers in table I, we may conclude that we have indeed succeeded in constructing an efficient and still reconfigurable simulator that is cycle accurate.

The current ORAS simulator is constructed for a dataflow architecture.
Ongoing research investigates whether similar concepts can be used for more heterogeneous architecture templates.

References

[1] Bart Kienhuis, Ed Deprettere, Kees Vissers and Pieter van der Wolf, "A quantitative approach to dataflow architecture design," in Proceedings of the ProRISC/IEEE Workshop on Circuits, Systems and Signal Processing. STW/ProRISC, Nov. 27-28 1996, pp. 189-194.
[2] Bart Kienhuis, Ed Deprettere, Kees Vissers and Pieter van der Wolf, "An approach for quantitative analysis of application-specific dataflow architectures," in Proceedings of 11th Int. Conference of Applications-specific Systems, Architectures and Processors, Zurich, Switzerland, July 14-16 1997, pp. 338-349.
[3] Jeroen A.J. Leijten, Jef L. van Meerbergen, Adwin H. Timmer and Jochen A.G. Jess, "Prohid, A Data-Driven Multi-Processor Architecture for High-Performance DSP," in Proc. ED&TC, Mar. 17-2
Similar resources
A Method to Construct Reconfigurable Simulators from Architecture
Systems become more and more programmable. To design programmable architectures for these systems, we suggested an approach to measure, via simulation, the performance of a set of applications that are mapped on architecture instances. These architecture instances are derived from an architecture template. This requires that we can construct reconfigurable simulators from architecture templat...
A Computation Model for Intelligent
Microprocessors and memory systems suffer from a growing gap in performance. We introduce Active Pages, a computation model which addresses this gap by shifting data-intensive computations to the memory system. An Active Page consists of a page of data and a set of associated functions which can operate upon that data. We describe an implementation of Active Pages on RADram (Reconfigurable Archit...
A Quantitative Analysis of Reconfigurable Coprocessors for Multimedia Applications
Recently, computer architectures that combine a reconfigurable (or retargetable) coprocessor with a general-purpose microprocessor have been proposed. These architectures are designed to exploit large amounts of fine grain parallelism in applications. In this paper, we study the performance of the reconfigurable coprocessors on multimedia applications. We compare a Field Programmable Gate Array (...
Fast Template Placement for Reconfigurable Computing Systems
The advances in programmable hardware have led to new architectures, where the hardware can be dynamically adapted to the application to gain better performance. One of many challenging problems in realizing a general-purpose reconfigurable system is the placement of the modules on the reconfigurable functional unit (RFU). In reconfigurable systems, we are interested both in online template ...
Architecture Design of Reconfigurable Pipelined Datapaths
This paper examines reconfigurable pipelined datapaths (RaPiDs), a new architecture style for computation-intensive applications that bridges the cost/performance gap between general purpose and application specific architectures. RaPiDs can provide significantly higher performance than general purpose processors on a wide range of applications from the areas of video and signal processing, scient...
Making Middleware Communication Architecture Reconfigurable
To deal with emerging network technologies and services, as well as with varying application requirements and dynamic environmental conditions, middleware has to be reconfigurable. This requirement highly affects the middleware communication architecture, which comprises inherent middleware communication mechanisms and the underlying network architecture. In this report, we introduce an architec...